Biostat 200B Homework 5
Due Feb 19 @ 11:59PM
Question A.1
For the regression of logdocper on logpopdens, bedp1000, hsgrad, poverty, unemp, pcinck, provide an interpretation of each of the regression coefficients.
Answer:
The SAS code is as follows:
Intercept = -2.98003 and the interpretation is: when logpopdens, bedp1000, hsgrad, poverty, unemp, and pcinck are all zero (not a realistic scenario), the estimated geometric mean number of active physicians (in 1990) per 1000 people is \(e^{-2.98003}\) = 0.0508. Because the response is on the log scale, exponentiated quantities describe the geometric mean (equivalently, the median if the errors on the log scale are symmetric) rather than the arithmetic mean.
Coefficient for logpopdens = 0.04832 and the interpretation is: because both the response and this predictor are on the log scale, a 1% increase in population density multiplies the estimated geometric mean number of active physicians (in 1990) per 1000 people by \(1.01^{0.04832}\) = 1.000481, i.e., an increase of approximately 0.048%, controlling for all other independent variables.
Coefficient for bedp1000 = 0.15378 and the interpretation is: each additional hospital bed (in 1990) per 1000 people multiplies the estimated geometric mean number of active physicians (in 1990) per 1000 people by \(e^{0.15378}\) = 1.166234, i.e., a 16.6% increase, controlling for all other independent variables.
Coefficient for hsgrad = 0.01711 and the interpretation is: each one-percentage-point increase in the percent of the adult population (25 years and older) who completed 12 or more years of school multiplies the estimated geometric mean number of active physicians (in 1990) per 1000 people by \(e^{0.01711}\) = 1.017257, i.e., a 1.7% increase, controlling for all other independent variables.
Coefficient for poverty = 0.04025 and the interpretation is: each one-percentage-point increase in the percent of the 1990 population with income below the poverty level multiplies the estimated geometric mean number of active physicians (in 1990) per 1000 people by \(e^{0.04025}\) = 1.041071, i.e., a 4.1% increase, controlling for all other independent variables.
Coefficient for unemp = -0.02622 and the interpretation is: each one-percentage-point increase in the percent of the 1990 labor force that was unemployed multiplies the estimated geometric mean number of active physicians (in 1990) per 1000 people by \(e^{-0.02622}\) = 0.9741208, i.e., a 2.6% decrease, controlling for all other independent variables.
Coefficient for pcinck = 0.06515 and the interpretation is: each $1,000 increase in 1990 per capita income multiplies the estimated geometric mean number of active physicians (in 1990) per 1000 people by \(e^{0.06515}\) = 1.067319, i.e., a 6.7% increase, controlling for all other independent variables.
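The back-transformations above can be checked with a short Python sketch. It assumes the logs in logdocper and logpopdens are natural logs (consistent with the use of \(e\) above) and simply re-derives the multipliers from the coefficients quoted from the SAS output; the variable names are illustrative.

```python
import math

# Coefficients copied from the SAS output quoted above; response is logdocper.
# Assumption: logdocper and logpopdens are natural logs.
intercept = -2.98003
slopes = {
    "bedp1000": 0.15378,
    "hsgrad": 0.01711,
    "poverty": 0.04025,
    "unemp": -0.02622,
    "pcinck": 0.06515,
}
b_logpopdens = 0.04832  # log-log term, handled separately below

# Back-transformed intercept: fitted docper when every predictor equals zero.
baseline = math.exp(intercept)
print(f"exp(intercept) = {baseline:.4f}")

# For a predictor entering linearly, a one-unit increase multiplies the
# fitted docper by exp(b); 100 * (exp(b) - 1) is the percent change.
multipliers = {name: math.exp(b) for name, b in slopes.items()}
for name, m in multipliers.items():
    print(f"{name:9s} exp(b) = {m:.6f} ({100 * (m - 1):+.1f}%)")

# logpopdens is itself logged, so a 1% increase in popdens multiplies
# the fitted docper by 1.01 ** b (approximately a b% change).
popdens_effect = 1.01 ** b_logpopdens
print(f"logpopdens 1.01**b = {popdens_effect:.6f} ({100 * (popdens_effect - 1):+.4f}%)")
```

For small coefficients, \(100 \times b\) is already a close approximation to the percent change; exponentiating matters more as \(|b|\) grows (for bedp1000, 15.4% versus the exact 16.6%).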
Question A.2
For the predictor pcinck, verify that the SE of the regression coefficient is equal to the formula involving \(R^2_j\) given in lecture, i.e., find each of the quantities in the SE and calculate the SE, verifying that it is equal to the SE given in the SAS output for the regression.
Answer:
The SAS code is as follows:
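The SE formula involving \(R^2_j\) is \(\widehat{Var}(b_j) = \frac{\hat{\sigma}^2}{(n-1)s^{2}_{j}(1-R^2_j)} = VIF_j \times \frac{\hat{\sigma}^2}{(n-1)s^{2}_{j}}\), where \(VIF_j = 1/(1-R^2_j)\). From the SAS output, we have \(\hat{\sigma}^2 = 0.30610^2 = 0.093697\), \(s^{2}_{j} = 4.05919^2 = 16.47702\), and \(VIF_j = 2.38561\), so \(R^2_j = 1 - 1/2.38561 = 0.58082\). With \(n - 1 = 439\), the variance of the regression coefficient is \(2.38561 \times \frac{0.093697}{439\times16.47702} = 3.09017\times10^{-5}\).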
The standard error of the regression coefficient is the square root of the variance, \(\sqrt{3.09017\times10^{-5}} = 0.00556\). This matches the standard error of \(0.00556\) reported for pcinck in the SAS output, which verifies the formula involving \(R^2_j\).
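The same calculation as a small Python sketch, useful for checking the arithmetic. The inputs are the values quoted from the SAS output; \(n = 440\) is inferred from the \(n - 1 = 439\) used above, and the variable names are illustrative rather than SAS output labels.

```python
import math

# Quantities quoted from the SAS output (pcinck predictor).
root_mse = 0.30610   # sigma-hat (Root MSE)
sd_pcinck = 4.05919  # sample SD of pcinck, s_j
vif = 2.38561        # VIF for pcinck
n = 440              # inferred: n - 1 = 439 counties in the fit

sigma2 = root_mse ** 2
s2 = sd_pcinck ** 2
r2_j = 1 - 1 / vif   # R^2 from regressing pcinck on the other predictors

# Var(b_j) = sigma^2 / ((n - 1) * s_j^2 * (1 - R_j^2))
var_b = sigma2 / ((n - 1) * s2 * (1 - r2_j))
se_b = math.sqrt(var_b)

print(f"sigma^2 = {sigma2:.6f}, s_j^2 = {s2:.5f}, R_j^2 = {r2_j:.5f}")
print(f"Var(b_j) = {var_b:.5e}, SE(b_j) = {se_b:.5f}")
```

Using the unrounded \(\hat{\sigma}^2\) keeps the variance at \(3.09017\times10^{-5}\); rounding \(\hat{\sigma}^2\) to 0.0937 first shifts the fifth significant digit but not the reported SE.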
Question A.3
In general, would you expect VIFs to increase or decrease as more predictors are added to a model? Provide a justification for your answer. Illustrate by providing some example regressions using the CDI dataset.
Answer:
In general, we can expect VIFs to increase (or at least not decrease) as more predictors are added to a model. The VIF for predictor \(j\) is \(VIF_j = 1/(1-R^2_j)\), where \(R^2_j\) is the \(R^2\) from regressing predictor \(j\) on the other predictors in the model; it measures how much the variance of the regression coefficient for that predictor is inflated by multicollinearity. Adding predictors enlarges the set of regressors used to explain predictor \(j\), and a least-squares \(R^2\) can never decrease when regressors are added, so \(R^2_j\), and hence \(VIF_j\), cannot decrease. The increase is large when the new predictor is strongly correlated with predictor \(j\) and negligible when it is nearly uncorrelated. The illustration is as follows:
In the beginning, we have a model with only one predictor, hsgrad, and its VIF is 1. When we add logpopdens to the model, the VIFs increase only slightly, to 1.01347, because the Pearson correlation between the two predictors is small. When we then add bagrad, the VIFs increase substantially, to 2.09732, because bagrad and hsgrad are highly correlated. In both steps, the VIFs increase as predictors are added to the model.
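A small simulation can illustrate the argument. This is a sketch on synthetic data, not the CDI dataset: x1, x2, and x3 are hypothetical stand-ins playing the roles of hsgrad, logpopdens (nearly uncorrelated with x1), and bagrad (strongly correlated with x1), and the helper `vif` is written here from the definition \(1/(1-R^2_j)\) rather than taken from a library.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R_j^2), where R_j^2 comes from an OLS fit
    (with intercept) of X[:, j] on the remaining columns of X."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    if others.shape[1] == 0:
        return 1.0  # single-predictor model: nothing to regress on
    design = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

# Synthetic stand-ins (NOT the CDI data): x2 is independent of x1,
# while x3 is built to have correlation of about 0.7 with x1.
rng = np.random.default_rng(0)
n = 440
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.7 * x1 + 0.7 * rng.normal(size=n)

# VIF of x1 as predictors are added one at a time.
vifs_x1 = []
for cols in ([x1], [x1, x2], [x1, x2, x3]):
    X = np.column_stack(cols)
    vifs_x1.append(vif(X, 0))
print([round(v, 4) for v in vifs_x1])
```

Because the fits are nested, the printed sequence is nondecreasing for any data; the size of each step depends on how strongly the added predictor is related to x1, mirroring the small jump for logpopdens and the large jump for bagrad.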
Question B.1
Answer:
Question B.2
Answer:
Question B.3
Answer:
Question B.4
Answer: